AD706152 


"THE EMPTY COLUMN" REVISITED

William  J.  Wiswesser 
Fort  Detrick 
Frederick,  Md.  21701 

A  Chemical  Notation  that  Appeared  with  Computer 
Languages  in  1950 


Reprinted with permission from "Computers and Automation", April, 1970, copyright 1970
by and published by Berkeley Enterprises, 815 Washington St., Newtonville, MA 02160



William J. Wiswesser, a research chemist at Fort Detrick,
Frederick, Md., probably is best known as the inventor of the
Wiswesser Line Notation (WLN), which "The Empty
Column" parable introduced 20 years ago. He is a native
Pennsylvanian, graduated from Lehigh University in 1936,
later taught chemical engineering courses at Cooper Union,
and probably created the WLN as a hybrid of long-rooted
interests in atomic art, molecular structure, history of
chemistry, and information theory.


"These are just starting examples of computer benefits that the chemical
world will enjoy when more manpower, money and talented attention is
devoted to this 20-year-old chemical notation with the empty columns."


The parable about a "New Notation" of Long Ago
(Computers and Automation, January 1970, page 16) has a
significance that was not fully appreciated when it was
written twenty years ago - that this imagined rejection of
Arabic numerals by users of Roman numerals may have
occurred many times during the past two thousand years!
Medieval merchants were jailed if they were caught
manipulating "those heathen signs and symbols". The battle lasted
for some 300 years, because official examiners - like the
Roman in the parable - just did not see how the positional
Arabic numeration profoundly simplified all mathematical
operations.

Martin Gardner gave the following fascinating background
details on this mathematical blindness in the January
1970 issue of Scientific American (pages 124-125):

"For more than 15 centuries the Greeks and
Romans and then Europeans of the Middle Ages and
early Renaissance calculated on devices with authentic
place-value systems in which zero was represented
by an empty line or groove or by an empty position
on the line or groove. Yet when these same people
calculated without mechanical aids, they used clumsy
notational systems lacking both place values and
zeros. It took a long time [from 1202 to the 16th
century] ... to realize that in writing numbers
efficiently it is necessary to draw a symbol to indicate
that a place in the number symbolizes nothing.

"... In some European countries calculating by
'algorism' actually was forbidden by law, so that it
had to be done in secret. There was opposition to it
even in some Arabic countries. Not until paper
became plentiful in the 16th century did the new
notation finally win out, and soon after that the
shapes of the 10 digits became standardized because
of printing."

The corresponding need today for simplified chemical
descriptions should become obvious with just three
relatively simple statements, but chemists - like all humans -
continue to overlook the obvious:

(1) all chemical information has a cosmic common
denominator - the sharply defined atom-to-atom
structure descriptions;

(2) there are some 4,000,000 such reported
structures in the chemical world - needing concise
computer descriptions for their efficient retrieval;
and

(3) the most frequently used atomic symbols and
groups should be single-mark symbols.


This last point was made 157 years ago by J.J. Berzelius,
"the organizer of chemistry" and editor of many
pioneering chemical journals. But his point was soon
forgotten. Computers can help the chemists far more if the
chemists recognize and provide a notation that reflects
overall "least effort" (in the long-term view!). Least effort
implies being easy to learn, to read, to write, and to
remember - easy to use in every man/machine aspect.


Line-Formula Notations

The occasion for writing the "Empty Column" parable
was an internationally publicized development - the search
for an international chemical notation by a "Commission
on Codification, Ciphering, and Punched Card Techniques,"
established in 1947 by the International Union of Pure and
Applied Chemistry (IUPAC). In 1949 the author had been
appointed to serve in what then was called the "Punched
Card Committee" of the American Chemical Society; he
wrote this parable a year later (May 1950) as a needed
preface to his proposed standardization of "line-formula"
structure descriptions. Chemists had been using "rational
formulae" or "line formulas" as delineated structure
descriptions, ever since the age of Structural Chemistry
dawned in 1861. All that seemed necessary was a careful
standardization for tabulating equipment (and today's
computers) of this world-wide, time-tested tradition. The
parable was written as a caution to the IUPAC and other
examiners that any new notation may have a strange and
puzzling appearance at first glance.

Cosmic Identification

Line-formula notations developed in a simple and natural
way that most chemistry accounts overlook; so a few
explanatory figures and historic examples seem appropriate
here. The cosmic identification of a chemical compound is
its structural (or constitutional or "rational") formula - a
two-dimensional diagram showing how all the atoms in a
molecule are connected. Thus the three structure diagrams
in Figure A not only explain "rationally" what the
substances are - they also explain how ethyl acetate can be
hydrolyzed (split apart by the addition of H-O-H or water
and suitable catalyst) to ethyl alcohol and acetic acid, or
how the alcohol and acid combine to form the ester with a
suitable dehydrating agent.

Figure A.

The corresponding "new" notations (introduced with
the parable 20 years ago) are given under the names in
Figure A. These notations reflect a natural reduction in
writing effort that started almost as soon as structure
diagrams appeared. Thus within a brief seven-year period
(1861-1868), simpler and more compact linear expressions
replaced the two-dimensional diagrams in journal
discussions: the 2-carbon "ethyl" chain was contracted to
CH3.CH2- or CH3CH2- or C2H5- or simply Et marks. The
corresponding "acetyl" group was simplified to CH3.CO.-
or CH3CO- or simply Ac marks. Thus to this day ethyl
alcohol is frequently symbolized as EtOH, acetic acid as
AcOH, and ethyl acetate as EtOAc. The corresponding
"new" notations Q2, QV1, and 2OV1 give even more
concise descriptions, with simpler typography and more
logical (language-free) sets of symbols.

Comparing Old and New

Table 1 compares these names, old line-formulas, and
new notations with those of other related and important
compounds. The structure diagrams for the hydrocarbon
fragments are like those shown in Figure A. An amateur
code-breaker can see at a glance that analogous things have
analogous notation symbols: numerals denote the number
of carbon atoms in the hydrocarbon chains, and letters
denote "functional" groups that characterize the chemical
types. For example, alcohols have the lone -OH or Q-terminal,
ethers the lone -O- link, and ketones the lone -CO.- or
-V- link; acids have the -CO.OH or -VQ combination, and
esters the -O-CO.- or -OV- combination. (The period in
-CO.- denotes the end of the :O side group, distinguishing it
clearly from the connecting -O- link.)

Table 1. UNBRANCHED OPEN-CHAIN COMPOUNDS

NAME            OLD LINE-FORMULA          NEW NOTATION

acetone         CH3-CO.CH3                1V1
ethyl ether     C2H5-O-C2H5               2O2
ethyl acetate   C2H5-O-CO.CH3             2OV1
butyl acetate   CH3CH2CH2CH2-O-CO.CH3     4OV1
ethyl alcohol   CH3CH2-OH                 Q2
acetic acid     CH3-CO.OH                 QV1
carbonic acid   HO-CO.OH                  QVQ
ethylamine      CH3CH2-NH2                Z2
acetamide       CH3-CO.NH2                ZV1
urea            NH2-CO.NH2                ZVZ

Note: the period in the CO-groups denotes the end
of a doubly-bonded or :O side group, distinguishing
this from an -O- link.

Nitrogen analogs of alcohols and acids also have
notations that show more direct similarities than the
corresponding (unspaced amine and amide) names. The
appropriate pairs in Table 1 are those in which the terminal -OH
or Q-group is replaced by a -NH2 or Z-group: Q2 and Z2,
QV1 and ZV1, QVQ and ZVZ.

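The code-breaking described above can be tried mechanically. The following Python sketch is only an illustration, not a program from the original work: it reads an unbranched open-chain notation symbol by symbol and names each piece, using only the meanings stated in the text (numerals for carbon chains; Q, Z, O and V for the functional groups). The names GROUPS and describe are assumptions made for this example.

```python
# Symbol meanings taken from the Table 1 discussion; the table name GROUPS
# and the function name describe are hypothetical.
GROUPS = {
    "Q": "-OH terminal",
    "Z": "-NH2 terminal",
    "O": "-O- link",
    "V": "-CO.- link",
}


def describe(notation):
    """Return one plain-language label per symbol of an open-chain notation."""
    parts = []
    i = 0
    while i < len(notation):
        ch = notation[i]
        if ch.isdigit():
            # A run of digits counts the carbon atoms in one hydrocarbon chain.
            j = i
            while j < len(notation) and notation[j].isdigit():
                j += 1
            parts.append(f"{int(notation[i:j])}-carbon chain")
            i = j
        elif ch in GROUPS:
            parts.append(GROUPS[ch])
            i += 1
        else:
            raise ValueError(f"symbol {ch!r} is outside this sketch")
    return parts


for name, notation in [("ethyl alcohol", "Q2"),
                       ("acetic acid", "QV1"),
                       ("ethyl acetate", "2OV1")]:
    print(f"{name:14s} {notation:5s} {describe(notation)}")
```

Branch symbols, ring marks and spaced locants are deliberately left out; the sketch raises an error when it meets them.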

Branched  Structure 

The first branched structure in Figure B, copied from an
1866 report, shows how naturally the line-formula
convention arose as a one-dimensional printing simplification of
two-dimensional structure diagrams. At that time the
"carbon skeleton" usually was drawn vertically, like the human
skeleton, but with all of the "appendages" extending to the
right (and in more compact groups than those shown in
Figure A). Thus the line-formula delineation of these
compacted groups is simply a television-like scanning of the
two-dimensional diagram - left to right and top to bottom.
This illustrated notation introduces two new features, a
terminal VH-group for the top aldehyde or CHO-group, and
a Y-symbol for the Y-branched or ternary carbon (attached
to three atoms other than hydrogen). This branching
distinction is a very important "connection table"
specification. The linking -CH2CH2- group is denoted simply as a
2-carbon chain, without the extra H-atom that the
corresponding terminal chain must have.

Figure B.

Citric acid, the second example in Figure B, illustrates a
typical partial compacting of the pictured groups; the
reduced cluttering of lines emphasizes the distinct
X-branching nature of the central carbon atom; hence the
X-symbol denotes a quaternary carbon (attached to four
atoms other than hydrogen).

Chloropicrin, the third example in Figure B, also
illustrates an X-branched carbon and two other new features:
(1) a single G-mark "fusion" of the Cl symbol for the very
frequently cited chlorine atoms; and (2) a branched
dioxygen group, important enough to be denoted by a
single letter W (its "double-U" name alludes to the two
double-bond connections seen in most branched dioxygen
structures).


The  R-mark 

Three graphically distinct kinds of benzene derivatives
are illustrated in Figure C. All have a characteristic
regular-hexagonal C6-ring that is more prominent in chemical
catalogs than all other rings combined. Accordingly this
ring is denoted most efficiently as a single mark - the letter
R (for Ring) - and subordinated to all other atomic-group
symbols because of its superprominence. This R-mark saves
more writing effort than any other notation mark
(reflecting traditional abbreviations Ph or the "phi" sign Φ for the
phenyl or C6H5-group); it also eliminates the graphical
need to show the ring-forming connections as alternating or
"resonating" single and double bonds, often called aromatic
bonds to distinguish them from the quite different
open-chain double bonds.

Figure C.

Styrene, the first example in Figure C, illustrates the
open-chain kind of double bond, an unsaturation - hence
denoted with the letter U. These groups are so active that
they will spontaneously link together, forming the
saturated chains of polystyrene, with a C6H5 or phenyl side
group on every other chain atom. Many other phenyl or
C6H5-derivatives (with only one replaced H-atom), like
styrene, have structurally unrevealing names and pictorially
direct notations. A few of these many examples are listed in
Table 2.

Table 2. COMMON PHENYL DERIVATIVES

NAME            OLD LINE-FORMULA     NEW NOTATION

anisole         C6H5-O-CH3           1OR
toluene         C6H5-CH3             1R
styrene         C6H5-CH:CH2          1U1R
phenol          C6H5-OH              QR
benzoic acid    C6H5-CO.OH           QVR
nitrobenzene    C6H5-NO2             WNR
aniline         C6H5-NH2             ZR

Note: The C6H5-ring fragment is frequently denoted
as Ph or Φ (phi). In the new notation, the
ZERO mark is slashed as a Ø mark.


Aspirin, the second example in Figure C, appropriately
shows the "empty column" solution to what is a real
headache in many other chemical notations: the need for a
logically distinct set of symbols to locate ring positions. In
1866 Kekulé used lower case letters for this purpose, so in
1950 his meaning was put into "Teletype" equivalents by
prefixing each locant letter with a blank space.

TNT, the third example, illustrates how this spaced
locant alone suffices when the located group is the
commonplace methyl group or unit-carbon chain.

Space as a Mathematical Operator

The "Empty Column" thus seemed an appropriate title
for the parable because a corresponding "empty" or blank
space is an essential and unique part of the notation that it
prefaced: this SPACE serves as a mathematical operator or
shift key to convey lower case meaning to the letter that
follows, and all such LOwer CAse letTErs LOCATE ring
positions. This spaced "locant" also begins a new unit of
information, mentally translating to mean "and at this ring
location the following atomic group is attached." Thus in
addition to the gain of a doubled keyboard without a
penny of cost, the heavily used spaces facilitate manual
reading, like the spaces between words. Similarly, spacing
numerals gives them a distinct meaning as multipliers of
the preceding string of symbols; these operate like a "Polish
string notation" in omitting the need for quantity-enclosing
marks, which were not available in 1950-vintage tabulating
equipment.
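The shift-key role of the blank space can be sketched in a few lines of Python. This is a hedged illustration only: it splits a notation at its spaces, treats the letter after each space as a lower-case ring locant, reads a bare locant as a located methyl group (the unit-carbon chain "1"), and reads a spaced numeral as a multiplier. The input string "QVR BOV1" is used in the style of the aspirin example and is an assumption of this sketch, as are the function name split_units and the tuple layout.

```python
def split_units(notation):
    """Split a notation into information units at its blank spaces.

    Returns tuples (kind, locant, content); the layout is hypothetical.
    """
    units = []
    for k, chunk in enumerate(notation.split(" ")):
        if chunk == "":
            raise ValueError("doubled, leading or trailing space")
        if k == 0:
            # Everything before the first space is the leading string.
            units.append(("main", None, chunk))
        elif chunk.isdigit():
            # A spaced numeral multiplies the preceding string of symbols.
            units.append(("multiplier", None, int(chunk)))
        elif chunk[0].isalpha():
            # The space "shifts" the next letter to lower case: a ring locant.
            # A locant with nothing after it is read as a located methyl group.
            units.append(("locant", chunk[0].lower(), chunk[1:] or "1"))
        else:
            raise ValueError(f"unit {chunk!r} is outside this sketch")
    return units


for unit in split_units("QVR BOV1"):
    print(unit)
```

Each space therefore starts a new unit, which is the "and at this ring location" reading described above.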

Other notation designers overlooked this obviously
profitable use of a "blank space" character, but that is not
surprising to historians: the Greeks and Romans, for all
their intelligence,

rantheirwordstogetherlikethisbecausetheydidnotrealizethat
SPACES greatly facilitated the reading
thereof! This spacing of words also was a medieval
discovery.

In 1950 there were no punctuation marks available
other than the ampersand, which has served well ever since
then to end side groups other than the few that are strictly
terminal by definition (like the illustrated G, H and Q
marks). Notations for all ring structures other than the
C6-hexagon of benzene ideally were enclosed in parentheses,
and the 1950 letter-substitutes for carbocyclic ring
notations were inspired from an 1866 diagram. In that year
Emil Erlenmeyer (the flask man) tried to explain the
two-ring structure of naphthalene with the diagram shown
in Figure D. His L-shaped and J-shaped marks indicated a
connecting line between those carbon atoms; this suggested
the use of L...J marks to "enclose" carbocyclic (including
alicyclic) ring-descriptions, and T...J to enclose
heTerocyclic equivalents. Rings in general can have so many
topological complications that it is not possible to
summarize other details here. Vitamin B6 in Figure D is a
heterocyclic compound of average complexity.

"Connection  Table"  Specifications 

The first rule of this "empty column" chemical notation
is to cite chains of atomic groups in end-to-end connecting
order, following the line-formula tradition. The "least
effort" gain is that no search has to be made for some
arbitrarily preferred "central component," as in the IUPAC
notation, and no related "assembly instructions" are
needed for the pictorially direct attachments. The gain in
minimizing "connection table" specifications seems so
obvious that one wonders why others had not applied this same
gain in complicated ring systems, where this least-effort
notation follows a longest possible path of connections.
This maximized path thereby minimizes ideal ring
descriptions to a simple recitation of the nonconsecutive links.

The second rule also is so simple and obvious that it was
overlooked until this line-formula notation appeared in
1950: Resolve all otherwise equal alternatives by the simple
alpha-numeric order of the notation symbols. Long
afterward, this proved to be the simplest thing a computer could
do: compare "equals" until a higher or lower resolution is
reached! Even here, intellectual complications have become
rooted; thus in 1950 the notation followed the seemingly
natural Hollerith-sorting sequence of numbers before
letters. (We could not imagine anyone counting his peanuts as
A, B, C, and then when he ran out of letters, going to 1, 2,
3!) The 1960 terminology defined the letters as having
higher rank than the numerals, just as the value of 9 is
higher than that of 1. The notation's rule 2 specified a
descending citing order - letters before numbers, because
in open-chain structures the letters feature the
characteristic chemical functions like acid, alcohol and aldehyde;
these determine the properties and uses, whereas the
numbers denote the number of carbon atoms in the relatively
inactive paraffin chains (Par affinis means low affinity or
low activity). Thus rule 2 tends to bring together
chemically similar things like open-chain alcohols in simple,
alphabetically arranged lists like those in Tables 1 and 2.
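Rule 2 is indeed a small job for a computer. The sketch below, in Python, is one hedged reading of it for unbranched open chains only: the two ways of citing a chain (from either end) are compared symbol by symbol, with letters ranked above numerals, and the higher citation is kept. Comparing multi-digit chain numbers by their numeric value is an assumption of this sketch, as are the function names rank, symbols and cite; ring notations, where the R-mark is subordinated to other symbols, are not covered.

```python
import re


def rank(symbol):
    """Rank one symbol: numerals below letters (assumed ordering)."""
    if symbol.isdigit():
        return (0, int(symbol))
    return (1, ord(symbol))


def symbols(notation):
    """Split an open-chain notation into chain numbers and single letters."""
    return re.findall(r"\d+|[A-Z]", notation)


def cite(notation):
    """Return whichever end-to-end citation ranks higher under rule 2."""
    forward = symbols(notation)
    backward = forward[::-1]
    # Compare "equals" position by position until one citation ranks higher.
    best = max(forward, backward, key=lambda seq: [rank(s) for s in seq])
    return "".join(best)


for candidate in ["2Q", "1VQ", "1VO2", "1VO4", "1V1"]:
    print(candidate, "->", cite(candidate))
```

With this ordering the alcohols all begin with Q and the acids with QV, which is the grouping effect the rule is meant to produce.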

Pope Paul described this "least effort" aim when he
advised "Avoid complicating simple things; strive to
simplify complicated things."

The  Character  Set 

The "program language" of this chemist-oriented
notation is best illustrated, not with more recited rules, but with
a summarizing review of the basic descriptive tools - the
character set. If these are well chosen, and cited in
pictorially direct connecting order, the rules for handling them
almost come naturally.

Berzelius, as previously noted, gave the first
long-overlooked requirement for citing chemical structures with least
effort: the most frequently cited atomic groups should have
single marks. Thus in 1813 he established nine perfect
choices for the very frequently cited nonmetallic atoms of
boron, carbon, fluorine, hydrogen, iodine, nitrogen,
oxygen, phosphorus, and sulfur. His apt recommendations can
be remembered as a tic-tac-toe that appropriately "begins
with BC," "has an I in the middle," and appropriately
"ends with a PS." (See first part of Figure E.)


B C F

H I N

O P S


Figure E.


Bromine was not yet discovered when Berzelius assigned
B for boron. Today it is extracted from the sea in ton-a-day
plants (to make lead-scavenging gasoline additives like
ethylene dibromide), so the ideal single-letter symbol is
"extracted" from the front part of the Br symbol. An
equally obvious clue was overlooked until some ten years
ago, when a Syracuse University student showed the lecture
audience that the hinted E can be extracted directly from
sEa!

Chlorine was first known by an appropriately frightening
appellation as "dephlogisticated muriatic acid gas", so
Berzelius aptly assigned a single letter M for the muriatic
radical in his first (1813) list of atomic symbols. To this
day the Cl replacement continues to give trouble in
letter-number ambiguities, so these are fused into a single-letter
G, the 7th letter of the alphabet for the leading atom in the
7th Group of the Periodic System. This choice is triply
appropriate because G and E stand next to each other in
the word haloGEn as well as in the Periodic Table. The
symbols F, H and I combine with these to form an
alphabetically closed set, with obvious indexing advantages.

Lengthening 

Berzelius analyzed the importance of symbol selections
so well that no new single-letter symbols need to be
assigned for high-frequency structural atoms, other than the
above E and G for bromine and chlorine atoms. However,
he and his followers overlooked an obvious gain in his
original intent to give all metallic atoms two-letter symbols,
like his original Po for potassium; metallic atoms are cited
much less frequently than nonmetallic atoms, and there are
far more kinds of them. Trends in "least effort" usage gave
overlooked clues like the lengthening (for easier
recognition) of L to Li and R to Rh for the rarely used lithium and
rhodium symbols. A late 19th century Harvard textbook of
chemistry showed a more helpful Ur instead of U for
uranium, Va instead of V for vanadium, and Wo instead of
W for wolfram or tungsten. A more recent (1921) textbook
of inorganic chemistry from M.I.T., and other reference
books of that period, showed Yt instead of Y for the really
rare yttrium in Periodic Tables. Chemistry students would
welcome the simplification that all metals have two-letter
symbols, as in this notation. The generalization extended to
the equally rare noble gases, for the 1954 notation manual
used Ar to denote argon before IUPAC made this an official
international atomic symbol.

Computer Restrictions

What happens when these two-letter symbols must be
written with the "Teletype" and computer restrictions of
strictly upper-case letters? Here another aid to recognition
was overlooked and insufficiently generalized in the original
1954 manual: All two-letter symbols now are set off in
hyphens. Then the computer chemist can wax poetic and
show that -AR- pairs with -KR- in Periodic Group 8, -KA-
with -NA- (after Latin kalium and natrium) in Group 1, -VA-
with -TA- in Group 5, and two for good measure in Group
6: -UR- with -CR- and -WO- as a "spitting image" of -MO-.
Rare -YT- matches the rare earth -YB- in Group 3.
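The practical effect of the hyphens is that a program can separate two-letter atoms from single-letter symbols without any lookup table. The Python sketch below is a minimal illustration under that one assumption; the pattern TOKEN, the function tokenize and the label strings are hypothetical, and the input strings are made up to exercise the tokenizer, not offered as valid notations.

```python
import re

# One alternative per symbol class: a hyphen-enclosed two-letter atom,
# a single upper-case letter, or a run of digits (a chain length).
TOKEN = re.compile(r"-([A-Z]{2})-|([A-Z])|(\d+)")


def tokenize(notation):
    """Split an upper-case notation into labelled symbols."""
    out = []
    pos = 0
    while pos < len(notation):
        match = TOKEN.match(notation, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.group(1):
            out.append(("two-letter atom", match.group(1)))
        elif match.group(2):
            out.append(("single-letter symbol", match.group(2)))
        else:
            out.append(("chain length", match.group(3)))
        pos = match.end()
    return out


print(tokenize("-NA-OV1"))
print(tokenize("G-KA-"))
```

Without the hyphens, a pair such as NA could not be told apart from an N followed by an A.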

The hyphenation intensifies recognition in printed lists,
and the two-letter standardization releases six precious
single letters for nonmetallic structural groups, most of
them cited more frequently than the previously introduced
E and G.

Astronauts as well as aquanauts now behold the beauty
of our water-covered blue earth. The OH-group always had
great prominence in AQUEOUS chemistry, and now it has
cosmic prominence as a free OH radical in outer space. The
obvious single-letter choice for this very important group is
extracted from pure or polluted AQUA, and this old letter
Q can be well remembered as an O-atom with an H-tail
(Figure E, center).

(Old radioman-practice slashes the zero, not the
frequently used letter O.)

Nitrogen chemistry parallels oxygen chemistry in many
ways, but this can be shown more refreshingly with a
programming aim to have the important NH-group match
the OH-group in retrieval sharpness. The notation symbol
for this linking or Mid-aMino NH-group is carefully selected
from the middle of the alphabet: the nitrogen counterpart
for Q is the letter M, an N-atom with an H-prop (Figure E).


Computer  Retrieval 

Carbon, of course, is the characteristic element of the
organic compounds that comprise some 94% of the
4,000,000 reported chemicals; and carbon atoms are found
among these structures far more frequently than any other
atoms excepting the stellar-wide hydrogen atoms - more
frequently, in fact, than all others combined. Thus good
computer retrieval requires distinctive single-letter symbols
for the different kinds of combined carbon. The obviously
best choice for an X-branched carbon atom in open-chain
structures is the letter X (denoting a quaternary carbon, or
one connected to four non-hydrogen atoms). Its quaternary
nitrogen parallel is denoted with the letter K, the
characteristic feature of "kwat" and "kationik" salts. The
Y-branched CH-group likewise is best denoted with the letter
Y (a carbon or CH-group attached to three non-hydrogen
atoms). A related Very common diValent connective, the
-CO.-group, is denoted with the letter V (first part of Figure
F).

Roman stone-masons made a "least-effort" V-cut for the
vowel U. (Its F-related meaning was a later medieval
addition to the alphabet.) Thus the V-group has within
itself an etymologically related U-mark, elsewhere used for
the Unsaturating double-bond link. This notation gives
considerable freedom from chemical bondage, because the
unsaturating symbol U is used only when it is necessary to
show the corresponding physical removal of H-atoms from
the connected carbon group (as in the previously illustrated
and listed styrene, 1U1R). A third related letter W was
chosen to denote the branched dioxygen part of nitro and
analogous O2-groups, because it literally whispers its
embedded double-U bonding pattern! (Figure F). The letter W
also was a medieval addition to the English alphabet,
designed to represent the "UU" or long "ooo" vowel
sound, hence its double-u name.


Figure  F. 

Since the benzene ring occurs more frequently in
structure descriptions than all other rings combined (including
benzo-fused rings with the others), the most appropriate
remaining letter selection for this Resonating,
Regular-hexagonal Ring therefore is the letter R, visualized as in
Figure F with two adjacent (or ortho) attachments. The
enclosed circle in this diagram is the logical "least-effort"
way of showing the "resonating" or alternating double
bonds. (The author was circling his benzene rings in this
"lazy-boy" manner some 35 years ago, so it is hardly a
modern innovation.)

Mnemonic  Associations 

Fastidious professors may feel deeply annoyed by the
mnemonic associations in these single-letter selections, and
they are not likely to be "turned" by the last of these
"dirty dozen" memorizing irritants: the terminal
NH2-group in this notation is denoted with the terminal letter Z
(from aZine and hydraZine), a doubly appropriate selection
because it is pictorially the very same as the letter N turned
on end (end of Figure F and end of the program-language
remarks!).

Perhaps the best way to emphasize and summarize this
"Empty Column" lesson about resisting change is a brief
recitation of what other users - thousands of miles away -
have done with this chemical notation in spite of its
officially unrecognized status. About ten years ago the
users simplified its identification; people have endless
difficulty with this three-syllable, nine-letter WISWESSER word
(a lifelong lesson to its bearer), so they speak of the
Wiswesser Line Notation simply as the WLN.

This WLN now has an "authorized manual," voluntarily
written by Elbert G. Smith (Professor of Chemistry at Mills
College in Oakland, California) and published by
McGraw-Hill in 1968 - after eight years of rule revisions and
user-tested improvements. All royalties from this book go
to a Chemical Notation Association, organized in 1965 "(1)
to promote and conduct research in the field of chemical
notation systems and to advance the development and
application of these systems; (2) to educate chemists in the
uses and advantages of these systems; and (3) to act as an
official adjudicating body to determine and control the
standard rules of any chemical notation system entrusted to
this Association for this purpose by its authors, inventors,
and developers." The 70-some members of this Association
in the United States, the United Kingdom, and France are
still concerned with only one notation: the WLN.

The appendix contains a partial list of organizations that
have put an investing interest in WLN, as evidenced by
publications. Programs based on this standardized
line-formula notation now have daily usage in IBM 360 or 1130,
Burroughs 6500, Honeywell 200 or 400, CDC 3150, GE
635, PDP-10 and other computers in chemical information
centers throughout the world.

Dow’s  CHECKER  Program 

One pioneering program, known as Dow's CHECKER
program, calculates a molecular formula from the notation
and compares this with a manually calculated input formula;
notation errors are about 2% and formula errors are the
same order of magnitude - around 2%.
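The kind of cross-check described here can be sketched for the simple notations of Table 1. The Python below is not Dow's program, whose internals are not given in this article; it is a minimal illustration that counts atoms for unbranched open-chain notations and compares the result with a claimed formula. It assumes that each chain carbon carries two hydrogens plus one more at any chain end that is also an end of the notation, and it writes formulas with C and H first and the other elements alphabetically. The names SYMBOL_ATOMS, molecular_formula and check are hypothetical.

```python
from collections import Counter

# Atoms contributed by the non-chain symbols, as defined in the text.
SYMBOL_ATOMS = {
    "Q": {"O": 1, "H": 1},   # -OH
    "Z": {"N": 1, "H": 2},   # -NH2
    "M": {"N": 1, "H": 1},   # -NH-
    "O": {"O": 1},           # -O-
    "V": {"C": 1, "O": 1},   # -CO.-
}


def molecular_formula(notation):
    """Molecular formula of an unbranched open-chain notation."""
    atoms = Counter()
    i, n = 0, len(notation)
    while i < n:
        if notation[i].isdigit():
            j = i
            while j < n and notation[j].isdigit():
                j += 1
            carbons = int(notation[i:j])
            atoms["C"] += carbons
            # Two H per chain carbon, plus one at each free end of the chain.
            atoms["H"] += 2 * carbons + (i == 0) + (j == n)
            i = j
        elif notation[i] in SYMBOL_ATOMS:
            atoms.update(SYMBOL_ATOMS[notation[i]])
            i += 1
        else:
            raise ValueError(f"symbol {notation[i]!r} is outside this sketch")
    order = [e for e in ("C", "H") if e in atoms]
    order += sorted(e for e in atoms if e not in ("C", "H"))
    return "".join(e + (str(atoms[e]) if atoms[e] > 1 else "") for e in order)


def check(notation, claimed_formula):
    """True when the claimed formula agrees with the notation."""
    return molecular_formula(notation) == claimed_formula


for notation, claimed in [("Q2", "C2H6O"), ("2OV1", "C4H8O2"), ("QV1", "C2H6O")]:
    print(notation, claimed, "agrees" if check(notation, claimed) else "DISAGREES")
```

A disagreement does not say which record is wrong, only that the notation and the hand-calculated formula cannot both be right.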

"WLN-permuting programs" identify another series of
routines for IBM, Burroughs, UNIVAC, Honeywell, and GE
computers. These programs "permute" or rotate the notation
records such that the repeatedly offset atomic symbols
form a "key-letter-in-context" alphabetized list. Copies of
this WLN-permuting routine have passed around at least a
half-dozen computer centers in the United States.
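
A minimal sketch of the permuting idea follows, written in Python for illustration only. It assumes that every letter in a notation serves as a key and that entries are ordered by the text running from the key letter to the end of the record; the historical routines may have chosen keys and ordering differently. The names permuted_index and format_index are hypothetical.

```python
def permuted_index(notations):
    """Return (tail, head, notation) entries, one per key letter, sorted.

    tail is the notation from the key letter onward; head is what precedes it.
    """
    entries = []
    for notation in notations:
        for i, symbol in enumerate(notation):
            if symbol.isalpha():          # assumed: only letters serve as keys
                entries.append((notation[i:], notation[:i], notation))
    entries.sort()                        # alphabetize on the key letter and its context
    return entries


def format_index(entries, key_column=12):
    """Lay out each entry so that the key letter falls in a fixed column."""
    lines = []
    for tail, head, _ in entries:
        left = head[-key_column:].rjust(key_column)   # right-align the leading context
        lines.append(left + tail)
    return lines
```

Calling format_index(permuted_index(["QV1G", "E1R"]), key_column=4) lists the two records under the key letters E, G, Q, R and V in turn, each line offset so that the key letter stands in the same column.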

Imperial Chemical Industries, Ltd. have CROSSBOW
programs that generate three possible outputs from input
WLN records: (1) connection tables for fine structure
searching, (2) open-ended "fragmentation codes", which
are chemically significant structural components that can
be printed as WLN symbol clusters and organized into file
records, and (3) computer-generated structure diagrams
composed on a high-speed printer. Complementary programs
are now in progress elsewhere to yield computer-composed
notations from input tapes of connection tables,
or from hand-drawn diagrams made with light-pen
communication by a clerk or chemist at the console.


The  PATHFINDER  Program 

The PATHFINDER program, written for Dow's Burroughs
5500 computer, is a very powerful routine that exhaustively
checks all trial paths in extremely complicated
ring structures, holding the correct lower-valued choice in all
comparisons; the final holding is converted to the infallibly
correct carbocyclic notation. Its input is our long-overlooked
"nonconsecutive links".

Binary "bit screens" can be searched at phenomenal
speed, compared with higher-language alternatives that
suffer much input/output processing translation. A
computer-generated equivalent of the 1950-vintage
multi-punched cards makes binary "scratches" for the
distinctively spaced or unspaced WLN symbols and yields a
30-fold increase in speed in sophisticated chemical structure
searches.
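
The screening principle can be sketched as follows, again in Python and only as an illustration. The sketch assumes one bit per symbol (letters, digits and the space), which is a simplification chosen here and not the layout of the programs described above; build_screen and screen_search are hypothetical names.

```python
import string

# Assumed symbol set: one bit position per letter, digit and the space.
SYMBOLS = string.ascii_uppercase + string.digits + " "


def build_screen(notation):
    """Set one bit for every distinct symbol that occurs in the notation."""
    mask = 0
    for ch in notation:
        position = SYMBOLS.find(ch)
        if position >= 0:                 # symbols outside the set are ignored
            mask |= 1 << position
    return mask


def screen_search(screens, required_symbols):
    """Names of the records whose screen contains every required symbol."""
    query = build_screen(required_symbols)
    return [name for name, mask in screens.items() if (mask & query) == query]
```

For a file held as screens = {n: build_screen(n) for n in ["QV1G", "Z2VQ", "E1R", "WSQR"]}, the call screen_search(screens, "QV") keeps QV1G and Z2VQ and rejects the rest with a single AND and compare per record. A screen of this kind only narrows the candidates: it records which symbols are present, not their order, so the survivors still need a full comparison of the notation.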

These are just starting examples of computer benefits
that the chemical world will enjoy when more manpower,
money and talented attention is devoted to this 20-year-old
chemical notation with the empty columns.

The "least effort" advantages of the author's proposed
"Line Formula Chemical Notation" were not acknowledged
at the decisive meeting by representatives of IUPAC and the
ACS Punched Card Committee, held at M.I.T. in August
1951; the IUPAC examiners decided to "give the axe to the
line-formula tradition" and favored an unfamiliar departure
that has a more complicated set of resolving rules and a
much more complicated character set. It has two or three
known users in the chemical world today, in spite of a
number of official promotional efforts by the IUPAC
authorities.

Like the Arabian mathematician in the parable, we can
only guess why we failed to interest official examiners at
M.I.T. in 1951 - and elsewhere since then. Perhaps the
simplest and most obvious solutions to complicated problems
are the most easily overlooked. The power of the
human brain to deceive itself - even when healthy and free
of disabling drugs - must not be underestimated. We
submit the comparisons listed in Table 3 of century-old line
formulas and their standardized WLN equivalents, for those
who wish to see the conservative correspondence with
tradition.

Table 3. COMPARISON OF EARLY LINE FORMULAS
(1861-1867) WITH WLN


1861-1867 Line Formula        WLN

W0’C3V)’0,C2H5                2OV2O2
CH2CN.CO.Br                   NC1VE
ClCH2-CO2H                    QV1G
H2N-CH2-CH2-CO2H              Z2VQ
CHCl2.CCl3                    GYGXGGG
CO.OH-C(OH)2-CO.OH            QVXQQVQ
CH3.CHI.COOH                  QVYI
CH3-CH.OH-CO.OH               QYVQ
C6H5-CH2-Br                   E1R
C6H5.CCl2H                    GYGR
C6HBrHBrHNO2                  WNR CE EE
C6H5.SO2.OH                   WSQR

Note: The CH3 groups attached to the branched C-atom
are understood by definition of the Y mark
(or X).


The last cited report on computer applications of the
WLN (44) gives in its appendix some 500 additional
examples, all identified by common name; most of them
are grouped into 18 sets, sequenced in increasing order of
structural complexity. The chronological arrangement of
the 71 reference citations in this same report also reflects
the "exponential" growth of user interest in the WLN: only
twelve references appeared in the first ten years
(1950-1959), then ten in the next five years (1960-1964),
followed by twelve in two years (1965-1966), seven in
1967, and no fewer than nineteen in 1968. This is gratifying
growth!

We acknowledge the growing signs of user interest in the
WLN with keen appreciation, and we submit
this "excursion in symbol-land" as special thanks to
Computers and Automation for recognition of the parable
that was written in 1950 to introduce our "empty column"
notation. □


APPENDIX 

A partial list of organizations that have expressed an interest in
the WLN and published or presented papers on it is given below.
Their reports are keyed to the numbers in the literature references
(which also include the earliest citations on the WLN).

J. T. Baker Chemical Co. (2, 3, 26, 43)
Chemical Abstracts Service (01
Diamond Shamrock Corporation (13, 14, 26)
Dow Chemical Company (7, 8, 26)
Food & Drug Administration (1)
GAF Corporation (35)
Goodyear Tire & Rubber Co. (10)
Hebrew University (Israel) (19)
Hoffmann-LaRoche, Inc. (26, 34)
Imperial Chemical Industries (21, 22, 26, ...)
Institute for Scientific Information (23, 28)
Eli Lilly and Company (29)
Mills College (E. G. Smith) (31, 32, 33)
Ministry of Defense of Israel (27)
National Bureau of Standards (11)
National Library of Medicine (1, 30)
Olin Mathieson Corporation (18)
G. D. Searle & Co., Inc. (5, 6)
Stanford Research Institute (20, 26)
University of Pennsylvania (24, 26)
University of Sheffield (UK) (25)
U. S. Army, CIDS Program (26)
U. S. Army, Edgewood Arsenal ILO (12, 15, 16, 17, 26)
U. S. Army, Fort Detrick (2, 3, 21, 26, 42, 43, 44)